Data Quality Improvement in the DaQuinCIS System
Authors
Abstract
Data quality improvement is becoming an increasingly important issue. In contexts where data are replicated among different sources, data quality can be improved through extensive data comparisons: where copies of the same data differ because of data errors, comparisons help to reconcile those copies. The best-quality copies can be selected or constructed in order to correct the other copies. Record matching algorithms can support the task of linking different copies of the same data so that reconciliation activities can start; for instance, record matching algorithms can be run periodically to reconcile copies of differing quality. Nevertheless, such extensive runs are typically performed at fixed instants. This leaves periods in which the quality of data can deteriorate while no quality improvement action is performed on them. In this paper, we describe the DaQuinCIS platform for data quality improvement in contexts where data are replicated among heterogeneous and distributed sources. The quality improvement strategy underlying the proposed platform complements periodic record matching with an “on-line” quality improvement performed at query processing time. We experimentally show the feasibility and effectiveness of our approach by applying it to real databases; we also quantitatively evaluate the efficiency of our system.
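The reconciliation idea in the abstract (link copies of the same data, select the best-quality values, and correct the other copies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DaQuinCIS implementation: the `Copy` structure, the per-attribute quality scores, the 0.8 threshold, and the functions `match`, `reconcile`, and `corrections` are all hypothetical, and `difflib.SequenceMatcher` stands in for a real record matching algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Copy:
    """One copy of a record as held by a source (hypothetical structure)."""
    source: str    # identifier of the source holding this copy
    values: dict   # attribute name -> value
    quality: dict  # attribute name -> assumed quality score in [0, 1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; difflib is a stand-in for a real record matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(copies, key, threshold=0.8):
    """Greedy record matching: each copy joins the first cluster whose first
    member has a similar enough `key` value, otherwise it starts a new cluster."""
    clusters = []
    for c in copies:
        for cluster in clusters:
            if similarity(cluster[0].values[key], c.values[key]) >= threshold:
                cluster.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def reconcile(cluster):
    """Build a best-quality record: for each attribute, keep the value of the
    copy with the highest quality score for that attribute."""
    attrs = sorted({a for c in cluster for a in c.values})
    best = {}
    for attr in attrs:
        holders = [c for c in cluster if attr in c.values]
        winner = max(holders, key=lambda c: c.quality.get(attr, 0.0))
        best[attr] = winner.values[attr]
    return best


def corrections(cluster, best):
    """Feedback to send back: (source, attribute, corrected value) for every
    copy whose value differs from the reconciled record."""
    return [
        (c.source, attr, value)
        for c in cluster
        for attr, value in best.items()
        if c.values.get(attr) != value
    ]


# Hypothetical copies of the same person held by three sources.
copies = [
    Copy("S1", {"name": "Maria Rossi", "city": "Rome"}, {"name": 0.9, "city": 0.4}),
    Copy("S2", {"name": "Maria Rosi", "city": "Roma"}, {"name": 0.5, "city": 0.95}),
    Copy("S3", {"name": "John Smith", "city": "Milan"}, {"name": 0.8, "city": 0.8}),
]

if __name__ == "__main__":
    # Query-time use: match the copies returned for a query, answer with the
    # reconciled record and list the corrections for the lower-quality copies.
    for cluster in match(copies, key="name"):
        best = reconcile(cluster)
        print(best, corrections(cluster, best))
```

The same functions could be run either periodically over whole sources or, in the "on-line" spirit described above, only over the copies touched by a query. Comparing each copy against a single cluster representative keeps the sketch simple; a production matcher would typically add blocking and a tuned similarity model.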
Similar resources
Peer-to-Peer Data Quality Improvement in the DaQuinCIS System
Data quality improvement is becoming an increasingly important issue. In contexts where data are replicated among different sources, data quality can be improved through extensive data comparisons: where copies of the same data differ because of data errors, comparisons help to reconcile those copies. Record matching algorithms can support the task of linking different copies of the...
The DaQuinCIS Broker: Querying Data and Their Quality in Cooperative Information Systems
In cooperative information systems, the quality of data exchanged and provided by different data sources is extremely important. A lack of attention to data quality can cause low-quality data to spread throughout the cooperative system. At the same time, improvement can be based on comparing data, correcting them and disseminating high-quality data. In this paper, a framework and a related arc...
Data Quality in Cooperative Information Systems
A Cooperative Information System (CIS) is a large-scale information system that interconnects various systems of different and autonomous organizations, geographically distributed and sharing common objectives (De Michelis et al., 1997). Among the different resources that are shared by organizations, data are fundamental; in real world scenarios, organization A may not request data from organiza...
The DaQuinCIS Architecture: a Platform for Exchanging and Improving Data Quality in Cooperative Information Systems
In cooperative information systems, the quality of data exchanged and provided by different data sources is extremely important. A lack of attention to data quality can cause low-quality data to spread throughout the cooperative system. At the same time, improvement can be based on comparing data, correcting them and thus disseminating high-quality data. In this paper, we present an architectu...
Enabling Data Quality Notification in Cooperative Information Systems through a Web-Service Based Architecture
Cooperative Information Systems (CISs) are often characterized by a high degree of data replication; for example, in an e-Government scenario, the personal data of citizens are stored by almost all administrations. In such scenarios, organizations typically provide the same information at distinct quality levels, which makes it possible to provide users with data of the highest available quality. Fu...